Semi-supervised recursively partitioned mixture models for identifying cancer subtypes
نویسندگان
چکیده
MOTIVATION Patients with identical cancer diagnoses often progress differently. The disparity we see in disease progression and treatment response can be attributed to the idea that two histologically similar cancers may be completely different diseases on the molecular level. Methods for identifying cancer subtypes associated with patient survival have the capacity to be powerful instruments for understanding the biochemical processes that underlie disease progression as well as providing an initial step toward more personalized therapy for cancer patients. We propose a method called semi-supervised recursively partitioned mixture models (SS-RPMM) that utilizes array-based genetic and patient-level clinical data for finding cancer subtypes that are associated with patient survival. RESULTS In the proposed SS-RPMM, cancer subtypes are identified using a selected subset of genes that are associated with survival time. Since survival information is used in the gene selection step, this method is semi-supervised. Unlike other semi-supervised clustering classification methods, SS-RPMM does not require specification of the number of cancer subtypes, which is often unknown. In a simulation study, our proposed method compared favorably with other competing semi-supervised methods, including: semi-supervised clustering and supervised principal components analysis. Furthermore, an analysis of mesothelioma cancer data using SS-RPMM, revealed at least two distinct methylation profiles that are informative for survival. AVAILABILITY The analyses implemented in this article were carried out using R (http://www.r.project.org/). CONTACT [email protected]; [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
منابع مشابه
Generalized mixture models, semi-supervised learning, and unknown class inference
In this paper, we discuss generalized mixture models and related semi-supervised learning methods, and show how they can be used to provide explicit methods for unknown class inference. After a brief description of standard mixture modeling and current model-based semi-supervised learning methods, we provide the generalization and discuss its computational implementation using three-stage expec...
متن کاملSemi-supervised internet network traffic classification using a Gaussian mixturemodel
With a dramatic increase in the number and variety of applications running over the internet, it is very important to be capable of dynamically identifying and classifying flows/traffic according to their network applications. Meanwhile, internet application classification is fundamental to numerous network activities. In this paper, we present a novel methodology for identifying different inte...
متن کاملA Note on the Descriptional Complexity of Semi-Conditional Grammars
The descriptional complexity of semi-conditional grammars is studied. A proof that every recursively enumerable language is generlji";'I":ff f iru*::?H:"ffi Xl,?li".ffi f iJ,,-,T?,ll,J"'"'n'"
متن کاملMulti-Instance Mixture Models and Semi-Supervised Learning
Multi-instance (MI) learning is a variant of supervised learning where labeled examples consist of bags (i.e. multi-sets) of feature vectors instead of just a single feature vector. Under standard assumptions, MI learning can be understood as a type of semisupervised learning (SSL). The difference between MI learning and SSL is that positive bag labels provide weak label information for the ins...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Bioinformatics
دوره 26 20 شماره
صفحات -
تاریخ انتشار 2010